{
 "cells": [
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "# RE模块字符串操作 "
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "* 使用match方法从开始处匹配字符串,if sucess return Match objection else return None\n",
    "\n",
    "`re.match(pattern,string,flags)`\n",
    "\n",
    "* 使用search方法在整个字符串中匹配子字符串,if sucess return Match objection else return None\n",
    "\n",
    "`re.search(pattern,string,flags)`\n",
    "\n",
    "* 使用findall方法匹配所有符合expression的字符串,if succss return Match-list else return None-list\n",
    "\n",
    "`re.findall(pattern,string,flags)`\n",
    "\n",
    ">参数说明\n",
    "\n",
    "1.pattern:匹配模板\n",
    "\n",
    "2.string:匹配字符串\n",
    "\n",
    "3.flags:可选参数,用于控制匹配方式,如是否区分大小写\n",
    "\n",
    "* 使用sub方法替换字符串\n",
    "\n",
    "`re.sub(pattern,repl,string,count,flag)`\n",
    "\n",
    ">参数说明\n",
    "\n",
    "1.pattern:匹配模板\n",
    "\n",
    "2.repl:表示替换成的字符串\n",
    "\n",
    "3.string:表示被替换的字符串\n",
    "\n",
    "3.count:表示替换最大次数\n",
    "\n",
    "* 使用split方法分割字符串\n",
    "\n",
    "`re.split(pattern,string,[maxsplit],[flags])`\n",
    "\n",
    ">参数说明\n",
    "\n",
    "1.string:分割字符串\n",
    "\n",
    "2.maxplit:最大分割次数\n"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 6,
   "metadata": {},
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "<_sre.SRE_Match object; span=(0, 6), match='MR_one'>\n",
      "MR_one mr_two\n",
      "0\n",
      "6\n",
      "(0, 6)\n",
      "MR_one\n"
     ]
    }
   ],
   "source": [
    "import re\n",
    "pattern = r'mr_\\w+'\n",
    "string = 'MR_one mr_two'\n",
    "match = re.match(pattern,string,re.I)\n",
    "print(match)\n",
    "print(match.string)   #提取要匹配的字符串\n",
    "print(match.start())    #匹配起始位置\n",
    "print(match.end())      #匹配结尾位置\n",
    "print(match.span())     #匹配起始结尾位置元组\n",
    "print(match.group())   #匹配的数据"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 9,
   "metadata": {},
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "321  Mr_luo\n",
      "Mr_luo\n"
     ]
    }
   ],
   "source": [
    "string = '321  Mr_luo'\n",
    "search = re.search(pattern,string,re.I)\n",
    "print(search.string)\n",
    "print(search.group())"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 12,
   "metadata": {},
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "['123', '123', '123', '123']\n"
     ]
    }
   ],
   "source": [
    "string = 'dsadsa 123 dsad 123 dsa 123 dsa 123'\n",
    "pattern = '[0-9]{1,3}'\n",
    "findall = re.findall(pattern,string)\n",
    "print(findall)"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 17,
   "metadata": {},
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "替换之前: dsadsa 123 dsad 123 dsa 123 dsa 123\n",
      "替换之后: dsadsa SSS dsad SSS dsa SSS dsa SSS\n"
     ]
    }
   ],
   "source": [
    "pattern = '[0-9]{1,3}'\n",
    "repl = 'SSS'\n",
    "print('替换之前:',string)\n",
    "sub = re.sub(pattern,repl,string)\n",
    "print('替换之后:',sub)"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 25,
   "metadata": {},
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "分割之前: dsadsa SSS dsad SSS dsa SSS dsa SSS\n",
      "分割之后: ['d', '', 'd', '', ' ', '', '', ' d', '', 'd ', '', '', ' d', '', ' ', '', '', ' d', '', ' ', '', '', '']\n"
     ]
    }
   ],
   "source": [
    "print('分割之前:',sub)\n",
    "pattern = '[a|s]'\n",
    "split = re.split(pattern,sub,0,re.I)\n",
    "print('分割之后:',split)"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 28,
   "metadata": {},
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "['.1', '.66']\n",
      "['.1', '.1']\n",
      "[('193.168.1', '.1'), ('192.168.1', '.1')]\n",
      "[('193.168.1.1', '.1'), ('192.168.1.66', '.66')]\n"
     ]
    }
   ],
   "source": [
    "# 匹配IP地址\n",
    "import re\n",
    "pattern1 = r'[1-9]{1,3}(\\.[0-9]{1,3}){3}'\n",
    "pattern2 = r'[1-9]{1,3}(\\.[0-9]{1,3}){2}'\n",
    "pattern3 = r'([1-9]{1,3}(\\.[0-9]{1,3}){2})'\n",
    "pattern4 = r'([1-9]{1,3}(\\.[0-9]{1,3}){3})'\n",
    "\n",
    "string = '193.168.1.1 192.168.1.66'\n",
    "findall1 = re.findall(pattern1,string)\n",
    "print(findall1)\n",
    "findall2 = re.findall(pattern2,string)\n",
    "print(findall2)\n",
    "findall3 = re.findall(pattern3,string)\n",
    "print(findall3)\n",
    "findall4 = re.findall(pattern4,string)\n",
    "print(findall4)"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 30,
   "metadata": {},
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "193.168.1.1\n",
      "192.168.1.66\n"
     ]
    }
   ],
   "source": [
    "for item in findall4:\n",
    "    print(item[0])"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "> 由上面的例子可以看到当分组里面套了分组的时候程序是如何运行的\n",
    "pattern3 = r'([1-9]{1,3}(\\.[0-9]{1,3}){2})'\n",
    "pattern4 = r'([1-9]{1,3}(\\.[0-9]{1,3}){3})'\n",
    "如这两个表达式在匹配的时候先忽略内层括号,考虑最外面一层()进行匹配,然后在匹配内层括号内的,然后把内层与外层的匹配值合并成一组\n",
    "(\\.[0-9]{1,3}){2})\n",
    "(\\.[0-9]{1,3}){3})\n",
    "这两个表达式的意思是重复匹配表达式(\\.[0-9]{1,3}),2、3次且取最后一次匹配值\n"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": []
  }
 ],
 "metadata": {
  "kernelspec": {
   "display_name": "Python [Anacond1]",
   "language": "python",
   "name": "Python [Anacond1]"
  },
  "language_info": {
   "codemirror_mode": {
    "name": "ipython",
    "version": 3
   },
   "file_extension": ".py",
   "mimetype": "text/x-python",
   "name": "python",
   "nbconvert_exporter": "python",
   "pygments_lexer": "ipython3",
   "version": "3.5.1"
  }
 },
 "nbformat": 4,
 "nbformat_minor": 4
}
